Overview

Dataset statistics

Number of variables: 26
Number of observations: 28318
Missing cells: 195
Missing cells (%): < 0.1%
Duplicate rows: 399
Duplicate rows (%): 1.4%
Total size in memory: 5.8 MiB
Average record size in memory: 216.0 B
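
The two percentage rows follow directly from the counts. The sketch below redoes that arithmetic with the figures reported above; the variable names are illustrative and are not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 26
n_observations = 28318
missing_cells = 195
duplicate_rows = 399

# Missing cells are measured against every cell in the table.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# Duplicate rows are measured against the number of observations.
duplicate_pct = 100 * duplicate_rows / n_observations

print(f"Missing cells: {missing_pct:.3f}% of {total_cells} cells")
print(f"Duplicate rows: {duplicate_pct:.1f}% of {n_observations} rows")
```

The missing-cell share comes out well below 0.1%, which is why the report shows it as "< 0.1%" rather than as a rounded figure.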

Variable types

CAT: 12
NUM: 10
DATE: 2
BOOL: 2

Warnings

Dataset has 399 (1.4%) duplicate rows (Duplicates)
first_name has a high cardinality: 1176 distinct values (High cardinality)
last_name has a high cardinality: 2722 distinct values (High cardinality)
allegation has a high cardinality: 82 distinct values (High cardinality)
precinct_alpha has a high cardinality: 80 distinct values (High cardinality)
year_received is highly correlated with complaint_id (High correlation)
complaint_id is highly correlated with year_received (High correlation)
allegation is highly correlated with fado_type (High correlation)
fado_type is highly correlated with allegation (High correlation)
complainant_age_incident is highly skewed (γ1 = -124.0719749) (Skewed)
officer_cumcount has 6779 (23.9%) zeros (Zeros)

Reproduction

Analysis started: 2020-11-30 00:21:27.709187
Analysis finished: 2020-11-30 00:21:49.987933
Duration: 22.28 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

first_name
Categorical

HIGH CARDINALITY

Distinct1176
Distinct (%)4.2%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
Michael
 
1431
Christophe
 
845
Joseph
 
769
John
 
717
Daniel
 
643
Other values (1171)
23913 
ValueCountFrequency (%) 
Michael14315.1%
 
Christophe8453.0%
 
Joseph7692.7%
 
John7172.5%
 
Daniel6432.3%
 
Robert5982.1%
 
Brian5311.9%
 
David5271.9%
 
James5261.9%
 
Thomas4631.6%
 
Other values (1166)2126875.1%
 
Frequencies of value counts

Unique

Unique116
Unique (%)0.4%
Histogram of lengths of the category

Length

Max length10
Median length6
Mean length5.985662829
Min length2

last_name
Categorical

HIGH CARDINALITY

Distinct2722
Distinct (%)9.6%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
Rodriguez
 
259
Ortiz
 
171
Rivera
 
167
Martinez
 
153
Morales
 
139
Other values (2717)
27429 
ValueCountFrequency (%) 
Rodriguez2590.9%
 
Ortiz1710.6%
 
Rivera1670.6%
 
Martinez1530.5%
 
Morales1390.5%
 
Perez1390.5%
 
Gonzalez1300.5%
 
Smith1270.4%
 
Martin1270.4%
 
Ramirez1180.4%
 
Other values (2712)2678894.6%
 
Frequencies of value counts

Unique

Unique251
Unique (%)0.9%
Histogram of lengths of the category

Length

Max length18
Median length7
Mean length6.65587259
Min length2

complaint_id
Real number (ℝ≥0)

HIGH CORRELATION

Distinct10497
Distinct (%)37.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean24810.71114
Minimum3158
Maximum43703
Zeros0
Zeros (%)0.0%
Memory size221.2 KiB

Quantile statistics

Minimum3158
5-th percentile6564.75
Q115037
median25794
Q334300.25
95-th percentile41303
Maximum43703
Range40545
Interquartile range (IQR)19263.25

Descriptive statistics

Standard deviation11117.48166
Coefficient of variation (CV)0.4480920195
Kurtosis-1.182591997
Mean24810.71114
Median Absolute Deviation (MAD)9490.5
Skewness-0.1615584346
Sum702589718
Variance123598398.4
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
36901300.1%
 
31072200.1%
 
38927200.1%
 
41986190.1%
 
34557190.1%
 
38071180.1%
 
40678180.1%
 
41779180.1%
 
33162180.1%
 
36313170.1%
 
Other values (10487)2812199.3%
 
ValueCountFrequency (%) 
31581< 0.1%
 
34322< 0.1%
 
34795< 0.1%
 
34911< 0.1%
 
35143< 0.1%
 
ValueCountFrequency (%) 
437031< 0.1%
 
436831< 0.1%
 
436731< 0.1%
 
436381< 0.1%
 
436203< 0.1%
 

month_received
Real number (ℝ≥0)

Distinct12
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean6.312451444
Minimum1
Maximum12
Zeros0
Zeros (%)0.0%
Memory size221.2 KiB

Quantile statistics

Minimum1
5-th percentile1
Q13
median6
Q39
95-th percentile12
Maximum12
Range11
Interquartile range (IQR)6

Descriptive statistics

Standard deviation3.369308794
Coefficient of variation (CV)0.5337559938
Kurtosis-1.183763925
Mean6.312451444
Median Absolute Deviation (MAD)3
Skewness0.05424331861
Sum178756
Variance11.35224175
MonotonicityNot monotonic
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%) 
327319.6%
 
825539.0%
 
925228.9%
 
525228.9%
 
424268.6%
 
224028.5%
 
623508.3%
 
123248.2%
 
1022898.1%
 
722688.0%
 
Other values (2)393113.9%
 
ValueCountFrequency (%) 
123248.2%
 
224028.5%
 
327319.6%
 
424268.6%
 
525228.9%
 
ValueCountFrequency (%) 
1218926.7%
 
1120397.2%
 
1022898.1%
 
925228.9%
 
825539.0%
 

year_received
Real number (ℝ≥0)

HIGH CORRELATION

Distinct23
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2011.446395
Minimum1998
Maximum2020
Zeros0
Zeros (%)0.0%
Memory size221.2 KiB

Quantile statistics

Minimum1998
5-th percentile2003
Q12007
median2012
Q32016
95-th percentile2019
Maximum2020
Range22
Interquartile range (IQR)9

Descriptive statistics

Standard deviation4.91397957
Coefficient of variation (CV)0.002443007968
Kurtosis-0.8586098243
Mean2011.446395
Median Absolute Deviation (MAD)4
Skewness-0.3110822189
Sum56960139
Variance24.14719521
MonotonicityIncreasing
Histogram with fixed size bins (bins=23)
ValueCountFrequency (%) 
201520827.4%
 
201620217.1%
 
201419707.0%
 
201819356.8%
 
201319346.8%
 
201718386.5%
 
200717366.1%
 
201117286.1%
 
201217056.0%
 
200916055.7%
 
Other values (13)976434.5%
 
ValueCountFrequency (%) 
1998190.1%
 
1999840.3%
 
20001950.7%
 
20013021.1%
 
20024881.7%
 
ValueCountFrequency (%) 
20202< 0.1%
 
201914275.0%
 
201819356.8%
 
201718386.5%
 
201620217.1%
 

mos_ethnicity
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
White
15000 
Hispanic
8033 
Black
4223 
Asian
 
1037
American Indian
 
25
ValueCountFrequency (%) 
White1500053.0%
 
Hispanic803328.4%
 
Black422314.9%
 
Asian10373.7%
 
American Indian250.1%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length15
Median length5
Mean length5.859841797
Min length5

mos_gender
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
M
26768 
F
 
1550
ValueCountFrequency (%) 
M2676894.5%
 
F15505.5%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length1
Median length1
Mean length1
Min length1

mos_age_incident
Real number (ℝ≥0)

Distinct39
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean32.41507169
Minimum21
Maximum60
Zeros0
Zeros (%)0.0%
Memory size221.2 KiB

Quantile statistics

Minimum21
5-th percentile24
Q128
median32
Q336
95-th percentile44
Maximum60
Range39
Interquartile range (IQR)8

Descriptive statistics

Standard deviation5.973311314
Coefficient of variation (CV)0.1842757397
Kurtosis0.2748606286
Mean32.41507169
Median Absolute Deviation (MAD)4
Skewness0.7187605364
Sum917930
Variance35.68044805
MonotonicityNot monotonic
Histogram with fixed size bins (bins=39)
ValueCountFrequency (%) 
3020517.2%
 
2819356.8%
 
3119006.7%
 
2918676.6%
 
3217816.3%
 
2717806.3%
 
3317046.0%
 
3515195.4%
 
3414915.3%
 
2614865.2%
 
Other values (29)1080438.2%
 
ValueCountFrequency (%) 
21160.1%
 
222050.7%
 
234701.7%
 
249953.5%
 
2513864.9%
 
ValueCountFrequency (%) 
602< 0.1%
 
586< 0.1%
 
575< 0.1%
 
565< 0.1%
 
554< 0.1%
 

complainant_ethnicity
Categorical

Distinct6
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
Black
16886 
Hispanic
6332 
White
2749 
Other
1762 
Asian
 
525
ValueCountFrequency (%) 
Black1688659.6%
 
Hispanic633222.4%
 
White27499.7%
 
Other17626.2%
 
Asian5251.9%
 
American Indian640.2%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length15
Median length5
Mean length5.693410552
Min length5

complainant_gender
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
Male
23409 
Female
4849 
Other
 
60
ValueCountFrequency (%) 
Male2340982.7%
 
Female484917.1%
 
Other600.2%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length6
Median length4
Mean length4.344586482
Min length4

complainant_age_incident
Real number (ℝ)

SKEWED

Distinct87
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean32.50847517
Minimum-4301
Maximum101
Zeros2
Zeros (%)< 0.1%
Memory size221.2 KiB

Quantile statistics

Minimum-4301
5-th percentile17
Q123
median30
Q341
95-th percentile55
Maximum101
Range4402
Interquartile range (IQR)18

Descriptive statistics

Standard deviation28.50157375
Coefficient of variation (CV)0.8767428678
Kurtosis18873.89805
Mean32.50847517
Median Absolute Deviation (MAD)8
Skewness-124.0719749
Sum920575
Variance812.3397064
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
2611454.0%
 
2411183.9%
 
3010473.7%
 
2310203.6%
 
2510053.5%
 
219453.3%
 
289433.3%
 
279333.3%
 
229253.3%
 
299063.2%
 
Other values (77)1833164.7%
 
ValueCountFrequency (%) 
-43011< 0.1%
 
-15< 0.1%
 
02< 0.1%
 
11< 0.1%
 
21< 0.1%
 
ValueCountFrequency (%) 
1012< 0.1%
 
881< 0.1%
 
874< 0.1%
 
862< 0.1%
 
841< 0.1%
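
The SKEWED flag on complainant_age_incident is consistent with the reported minimum of -4301, which is not a plausible age. The sketch below shows how one such entry can dominate skewness. It uses the plain moment-based definition (third central moment divided by the 1.5 power of the second), which is an assumption here: the report's estimator may apply a bias correction. The ages are made-up illustrative values, not rows from the dataset.

```python
def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical, plausible ages repeated to form a sample of 1000.
ages = [17, 23, 30, 41, 55] * 200

# The same sample plus one entry equal to the reported minimum.
with_outlier = ages + [-4301]

print("without outlier:", skewness(ages))
print("with outlier:   ", skewness(with_outlier))
```

A single extreme negative value turns a mildly right-skewed sample into a strongly left-skewed one, so the skewness figure here says more about data entry than about the age distribution.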
 

fado_type
Categorical

HIGH CORRELATION

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
Abuse of Authority
16850 
Force
6685 
Discourtesy
4147 
Offensive Language
 
636
ValueCountFrequency (%) 
Abuse of Authority1685059.5%
 
Force668523.6%
 
Discourtesy414714.6%
 
Offensive Language6362.2%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length18
Median length18
Mean length13.90599619
Min length5

allegation
Categorical

HIGH CARDINALITY
HIGH CORRELATION

Distinct82
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
Physical force
4673 
Word
3747 
Stop
2181 
Search (of person)
1928 
Frisk
1763 
Other values (77)
14026 
ValueCountFrequency (%) 
Physical force467316.5%
 
Word374713.2%
 
Stop21817.7%
 
Search (of person)19286.8%
 
Frisk17636.2%
 
Refusal to provide name/shield number14525.1%
 
Vehicle search13704.8%
 
Threat of arrest12894.6%
 
Vehicle stop10673.8%
 
Threat of force (verbal or physical)8673.1%
 
Other values (72)798128.2%
 
Frequencies of value counts

Unique

Unique5
Unique (%)< 0.1%
Histogram of lengths of the category

Length

Max length40
Median length14
Mean length14.39526096
Min length4

precinct
Real number (ℝ≥0)

Distinct79
Distinct (%)0.3%
Missing19
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean64.22739319
Minimum0
Maximum1000
Zeros2
Zeros (%)< 0.1%
Memory size221.2 KiB

Quantile statistics

Minimum0
5-th percentile14
Q142
median67
Q381
95-th percentile115
Maximum1000
Range1000
Interquartile range (IQR)39

Descriptive statistics

Standard deviation31.85861193
Coefficient of variation (CV)0.4960284133
Kurtosis77.76226724
Mean64.22739319
Median Absolute Deviation (MAD)23
Skewness2.778484333
Sum1817571
Variance1014.971154
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
7517846.3%
 
7310013.5%
 
449913.5%
 
799503.4%
 
469423.3%
 
679063.2%
 
408973.2%
 
477812.8%
 
777742.7%
 
1207472.6%
 
Other values (69)1852665.4%
 
ValueCountFrequency (%) 
02< 0.1%
 
11620.6%
 
51530.5%
 
61580.6%
 
71800.6%
 
ValueCountFrequency (%) 
10003< 0.1%
 
123980.3%
 
1222060.7%
 
1213401.2%
 
1207472.6%
 

contact_reason
Categorical

Distinct13
Distinct (%)< 0.1%
Missing128
Missing (%)0.5%
Memory size221.2 KiB
PD suspected C/V of violation/crime - street
9459 
Other
6687 
PD suspected C/V of violation/crime - auto
2837 
PD suspected C/V of violation/crime - bldg
2121 
Moving violation
1927 
Other values (8)
5159 
ValueCountFrequency (%) 
PD suspected C/V of violation/crime - street945933.4%
 
Other668723.6%
 
PD suspected C/V of violation/crime - auto283710.0%
 
PD suspected C/V of violation/crime - bldg21217.5%
 
Moving violation19276.8%
 
Other violation of VTL10903.8%
 
Report-dispute9233.3%
 
Report of other crime7222.5%
 
Parking violation6032.1%
 
Execution of search warrant5001.8%
 
Other values (3)13214.7%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length58
Median length42
Mean length28.8152765
Min length3

outcome_description
Categorical

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
Arrest
12607 
None
10265 
Summons
5348 
Other
 
98
ValueCountFrequency (%) 
Arrest1260744.5%
 
None1026536.2%
 
Summons534818.9%
 
Other980.3%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length7
Median length6
Mean length5.460413871
Min length4

disposition_clean
Categorical

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
Unsubstantiated
13590 
Exonerated
7748 
Substantiated
6980 
ValueCountFrequency (%) 
Unsubstantiated1359048.0%
 
Exonerated774827.4%
 
Substantiated698024.6%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length15
Median length13
Mean length13.13899287
Min length10

ruling_conduct_occurred
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
1
14728 
0
13590 
ValueCountFrequency (%) 
11472852.0%
 
01359048.0%
 

ruling_conduct_violated_rules
Boolean

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
0
21338 
1
6980 
ValueCountFrequency (%) 
02133875.4%
 
1698024.6%
 

received_datetime
Date

Distinct257
Distinct (%)0.9%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
Minimum1998-03-01 00:00:00
Maximum2020-01-01 00:00:00
Histogram with fixed size bins (bins=50)

closed_datetime
Date

Distinct240
Distinct (%)0.8%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
Minimum2000-03-01 00:00:00
Maximum2020-06-01 00:00:00
Histogram with fixed size bins (bins=50)

officer_cumcount
Real number (ℝ≥0)

ZEROS

Distinct29
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.873543329
Minimum0
Maximum28
Zeros6779
Zeros (%)23.9%
Memory size221.2 KiB

Quantile statistics

Minimum0
5-th percentile0
Q11
median2
Q34
95-th percentile9
Maximum28
Range28
Interquartile range (IQR)3

Descriptive statistics

Standard deviation3.269607464
Coefficient of variation (CV)1.137831273
Kurtosis5.751913661
Mean2.873543329
Median Absolute Deviation (MAD)2
Skewness2.020516489
Sum81373
Variance10.69033297
MonotonicityNot monotonic
Histogram with fixed size bins (bins=29)
ValueCountFrequency (%) 
0677923.9%
 
1560119.8%
 
2434415.3%
 
3307310.9%
 
423108.2%
 
516485.8%
 
611744.1%
 
79533.4%
 
86082.1%
 
94841.7%
 
Other values (19)13444.7%
 
ValueCountFrequency (%) 
0677923.9%
 
1560119.8%
 
2434415.3%
 
3307310.9%
 
423108.2%
 
ValueCountFrequency (%) 
282< 0.1%
 
272< 0.1%
 
263< 0.1%
 
254< 0.1%
 
247< 0.1%
 

command_rank_num
Real number (ℝ≥0)

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.594462886
Minimum1
Maximum7
Zeros0
Zeros (%)0.0%
Memory size221.2 KiB

Quantile statistics

Minimum1
5-th percentile1
Q11
median1
Q32
95-th percentile3
Maximum7
Range6
Interquartile range (IQR)1

Descriptive statistics

Standard deviation0.9690046128
Coefficient of variation (CV)0.6077310557
Kurtosis1.484993552
Mean1.594462886
Median Absolute Deviation (MAD)0
Skewness1.476812369
Sum45152
Variance0.9389699396
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%) 
11935068.3%
 
3501117.7%
 
226909.5%
 
410513.7%
 
51120.4%
 
61030.4%
 
71< 0.1%
 
ValueCountFrequency (%) 
11935068.3%
 
226909.5%
 
3501117.7%
 
410513.7%
 
51120.4%
 
ValueCountFrequency (%) 
71< 0.1%
 
61030.4%
 
51120.4%
 
410513.7%
 
3501117.7%
 

precinct_alpha
Categorical

HIGH CARDINALITY

Distinct80
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size221.2 KiB
P_75
 
1784
P_73
 
1001
P_44
 
991
P_79
 
950
P_46
 
942
Other values (75)
22650 
ValueCountFrequency (%) 
P_7517846.3%
 
P_7310013.5%
 
P_449913.5%
 
P_799503.4%
 
P_469423.3%
 
P_679063.2%
 
P_408973.2%
 
P_477812.8%
 
P_777742.7%
 
P_1207472.6%
 
Other values (70)1854565.5%
 
Frequencies of value counts

Unique

Unique0
Unique (%)0.0%
Histogram of lengths of the category

Length

Max length6
Median length4
Mean length4.158344516
Min length3

percent_unemployed_mean
Real number (ℝ≥0)

Distinct77
Distinct (%)0.3%
Missing24
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean11.67381508
Minimum3.494837819
Maximum100
Zeros0
Zeros (%)0.0%
Memory size221.2 KiB

Quantile statistics

Minimum3.494837819
5-th percentile5.520730327
Q18.624282869
median11.50349865
Q314.01450408
95-th percentile16.74778723
Maximum100
Range96.50516218
Interquartile range (IQR)5.390221206

Descriptive statistics

Standard deviation4.473285897
Coefficient of variation (CV)0.3831897169
Kurtosis65.80121731
Mean11.67381508
Median Absolute Deviation (MAD)2.818408816
Skewness3.966154767
Sum330298.9239
Variance20.01028671
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
11.5034986517846.3%
 
14.0145040810013.5%
 
14.321907479913.5%
 
12.022911359503.4%
 
14.723453189423.3%
 
10.453920589063.2%
 
16.684955038973.2%
 
13.380182547812.8%
 
10.089809367742.7%
 
7.6977318817472.6%
 
Other values (67)1852165.4%
 
ValueCountFrequency (%) 
3.4948378191620.6%
 
3.656371471280.5%
 
3.938528102780.3%
 
4.1706255552971.0%
 
4.300006132180.8%
 
ValueCountFrequency (%) 
10012< 0.1%
 
28.739379324581.6%
 
19.261412455542.0%
 
16.747787235982.1%
 
16.684955038973.2%
 

percent_nohs_mean
Real number (ℝ≥0)

Distinct77
Distinct (%)0.3%
Missing24
Missing (%)0.1%
Infinite0
Infinite (%)0.0%
Mean22.23330075
Minimum1.011353087
Maximum50
Zeros0
Zeros (%)0.0%
Memory size221.2 KiB

Quantile statistics

Minimum1.011353087
5-th percentile6.667612475
Q114.99169451
median21.00517081
Q327.92354956
95-th percentile38.57440058
Maximum50
Range48.98864691
Interquartile range (IQR)12.93185505

Descriptive statistics

Standard deviation9.273705232
Coefficient of variation (CV)0.4171087926
Kurtosis-0.5502450903
Mean22.23330075
Median Absolute Deviation (MAD)6.013476308
Skewness0.1135490422
Sum629069.0115
Variance86.00160874
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%) 
22.7630963817846.3%
 
25.9896847510013.5%
 
35.544366139913.5%
 
19.937814759503.4%
 
34.149992879423.3%
 
14.991694519063.2%
 
38.574400588973.2%
 
19.540247417812.8%
 
16.844064067742.7%
 
13.009943327472.6%
 
Other values (67)1852165.4%
 
ValueCountFrequency (%) 
1.0113530871580.6%
 
1.933804875700.2%
 
2.2251374781280.5%
 
2.745617033780.3%
 
2.8197341321620.6%
 
ValueCountFrequency (%) 
5012< 0.1%
 
41.62357394581.6%
 
39.156841395542.0%
 
38.574400588973.2%
 
35.896992475672.0%
 

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a perfectly linear relationship the steepness of the line does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
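
The calculation just described can be sketched in plain Python. This is an illustrative implementation of the textbook formula, not the code the report uses; `pearson_r` and the sample data are made up for the example.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

x = [1, 2, 3, 4, 5]
steep = [10 * v + 3 for v in x]     # steep rising line
shallow = [0.1 * v - 7 for v in x]  # shallow rising line
falling = [-2 * v for v in x]       # falling line

print(pearson_r(x, steep), pearson_r(x, shallow), pearson_r(x, falling))
```

Both rising lines give the same coefficient regardless of slope and offset, which illustrates the invariance under changes in location and scale.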

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
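
A minimal sketch of that procedure: replace each value by its rank, then apply the Pearson formula to the ranks. Assigning tied values the average of their ranks is a common convention and is assumed here; the function names and data are illustrative.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

def spearman_rho(x, y):
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

print("Pearson: ", pearson_r(x, y))
print("Spearman:", spearman_rho(x, y))
```

On this monotonic but nonlinear data, Spearman's ρ reaches its maximum while Pearson's r stays below it.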

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
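
The pair-counting definition can be sketched directly. This is the simple variant often called tau-a; statistical libraries commonly use a tie-adjusted variant (tau-b), so results can differ when the data contain ties. The function name and data are illustrative.

```python
from itertools import combinations

def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, with no adjustment for ties."""
    concordant = discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y order the two observations the same way.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)

x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair out of six

print(kendall_tau(x, y))
```

With six pairs, five concordant and one discordant, the result is (5 - 1) / 6.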

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
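
A sketch of a bias-corrected Cramér's V computed from a contingency table, following the correction as it is usually stated for Bergsma's proposal. Whether this matches the report's implementation in every detail is an assumption, and the function name and tables are illustrative. The sketch expects every row and column total to be non-zero.

```python
import math

def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as a list of rows."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Shrink phi^2 and the table dimensions to remove the bias.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(k_corr - 1, r_corr - 1))

independent = [[10, 20], [20, 40]]  # rows are proportional
perfect = [[50, 0], [0, 50]]        # each row maps to exactly one column

print(cramers_v_corrected(independent), cramers_v_corrected(perfect))
```

The two tables sit at the extremes of the range: proportional rows give independence, and a one-to-one mapping gives perfect association.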

Missing values

Sample

First rows

(index) | first_name | last_name | complaint_id | month_received | year_received | mos_ethnicity | mos_gender | mos_age_incident | complainant_ethnicity | complainant_gender | complainant_age_incident | fado_type | allegation | precinct | contact_reason | outcome_description | disposition_clean | ruling_conduct_occurred | ruling_conduct_violated_rules | received_datetime | closed_datetime | officer_cumcount | command_rank_num | precinct_alpha | percent_unemployed_mean | percent_nohs_mean
0 | Kenneth | Cullen | 3158 | 3 | 1998 | White | M | 28 | Black | Male | 20.0 | Force | Gun Fired | 79.0 | PD suspected C/V of violation/crime - bldg | Arrest | Exonerated | 1 | 0 | 1998-03-01 | 2000-08-01 | 1 | 3 | P_79 | 12.022911 | 19.937815
1 | Michael | Vento | 3432 | 10 | 1998 | White | M | 32 | Black | Male | 32.0 | Force | Physical force | 40.0 | C/V intervened on behalf of/observed encounter w/3rd party | Arrest | Unsubstantiated | 0 | 0 | 1998-10-01 | 2000-05-01 | 0 | 1 | P_40 | 16.684955 | 38.574401
2 | Michael | Vento | 3432 | 10 | 1998 | White | M | 32 | Black | Male | 32.0 | Abuse of Authority | Frisk and/or search | 40.0 | C/V intervened on behalf of/observed encounter w/3rd party | Arrest | Unsubstantiated | 0 | 0 | 1998-10-01 | 2000-05-01 | 0 | 1 | P_40 | 16.684955 | 38.574401
3 | Michael | Cronin | 3479 | 11 | 1998 | White | M | 30 | Black | Male | 47.0 | Force | Punch/Kick | 46.0 | NaN | None | Unsubstantiated | 0 | 0 | 1998-11-01 | 2000-06-01 | 1 | 2 | P_46 | 14.723453 | 34.149993
4 | Michael | Cronin | 3479 | 11 | 1998 | White | M | 30 | Black | Male | 47.0 | Abuse of Authority | Frisk and/or search | 46.0 | NaN | None | Exonerated | 1 | 0 | 1998-11-01 | 2000-06-01 | 1 | 2 | P_46 | 14.723453 | 34.149993
5 | Michael | Cronin | 3479 | 11 | 1998 | White | M | 30 | Black | Male | 47.0 | Abuse of Authority | Person Searched | 46.0 | NaN | None | Unsubstantiated | 0 | 0 | 1998-11-01 | 2000-06-01 | 1 | 2 | P_46 | 14.723453 | 34.149993
6 | Michael | Cronin | 3479 | 11 | 1998 | White | M | 30 | Black | Male | 47.0 | Discourtesy | Word | 46.0 | NaN | None | Unsubstantiated | 0 | 0 | 1998-11-01 | 2000-06-01 | 1 | 2 | P_46 | 14.723453 | 34.149993
7 | Michael | Cronin | 3479 | 11 | 1998 | White | M | 30 | Black | Male | 47.0 | Abuse of Authority | Refusal to provide name/shield number | 46.0 | NaN | None | Substantiated | 1 | 1 | 1998-11-01 | 2000-06-01 | 1 | 2 | P_46 | 14.723453 | 34.149993
8 | Eric | Single | 3491 | 11 | 1998 | White | M | 29 | Black | Male | 33.0 | Force | Other | 115.0 | NaN | None | Unsubstantiated | 0 | 0 | 1998-11-01 | 2000-04-01 | 2 | 1 | P_115 | 7.829801 | 26.733794
9 | George | Sullivan | 3514 | 12 | 1998 | White | M | 28 | Black | Male | 14.0 | Force | Physical force | 72.0 | NaN | None | Exonerated | 1 | 0 | 1998-12-01 | 2000-04-01 | 0 | 1 | P_72 | 8.062203 | 32.570689

Last rows

(index) | first_name | last_name | complaint_id | month_received | year_received | mos_ethnicity | mos_gender | mos_age_incident | complainant_ethnicity | complainant_gender | complainant_age_incident | fado_type | allegation | precinct | contact_reason | outcome_description | disposition_clean | ruling_conduct_occurred | ruling_conduct_violated_rules | received_datetime | closed_datetime | officer_cumcount | command_rank_num | precinct_alpha | percent_unemployed_mean | percent_nohs_mean
28308 | Damon | Martin | 43619 | 12 | 2019 | Black | M | 45 | Black | Female | 59.0 | Abuse of Authority | Search of Premises | 75.0 | Execution of search warrant | Summons | Exonerated | 1 | 0 | 2019-12-01 | 2020-04-01 | 18 | 3 | P_75 | 11.503499 | 22.763096
28309 | Damon | Martin | 43619 | 12 | 2019 | Black | M | 45 | Black | Female | 59.0 | Abuse of Authority | Entry of Premises | 75.0 | Execution of search warrant | Summons | Exonerated | 1 | 0 | 2019-12-01 | 2020-04-01 | 18 | 3 | P_75 | 11.503499 | 22.763096
28310 | Charles | Powell | 43592 | 12 | 2019 | Black | M | 32 | Black | Male | 65.0 | Discourtesy | Action | 14.0 | Other | Summons | Substantiated | 1 | 1 | 2019-12-01 | 2020-05-01 | 1 | 1 | P_14 | 4.170626 | 6.667612
28311 | Charles | Powell | 43592 | 12 | 2019 | Black | M | 32 | Black | Male | 65.0 | Discourtesy | Action | 14.0 | Other | Summons | Substantiated | 1 | 1 | 2019-12-01 | 2020-05-01 | 1 | 1 | P_14 | 4.170626 | 6.667612
28312 | David | Ramirez | 43605 | 12 | 2019 | Hispanic | M | 26 | Black | Male | 43.0 | Offensive Language | Race | 48.0 | Report of other crime | None | Unsubstantiated | 0 | 0 | 2019-12-01 | 2020-05-01 | 4 | 1 | P_48 | 19.261412 | 39.156841
28313 | David | Ramirez | 43605 | 12 | 2019 | Hispanic | M | 26 | Black | Male | 43.0 | Abuse of Authority | Threat to damage/seize property | 48.0 | Report of other crime | None | Unsubstantiated | 0 | 0 | 2019-12-01 | 2020-05-01 | 4 | 1 | P_48 | 19.261412 | 39.156841
28314 | Anthony | Jones | 43638 | 12 | 2019 | Black | M | 41 | Asian | Male | 35.0 | Abuse of Authority | Threat of arrest | 18.0 | Moving violation | Summons | Exonerated | 1 | 0 | 2019-12-01 | 2020-04-01 | 1 | 1 | P_18 | 4.300006 | 5.275617
28315 | Timothy | Sprague | 43673 | 12 | 2019 | White | M | 32 | Black | Female | 42.0 | Abuse of Authority | Threat of arrest | 24.0 | Report of other crime | Arrest | Exonerated | 1 | 0 | 2019-12-01 | 2020-06-01 | 1 | 3 | P_24 | 7.257308 | 10.047435
28316 | Robert | Obrien | 43683 | 1 | 2020 | White | M | 35 | Black | Female | 56.0 | Abuse of Authority | Property damaged | 79.0 | Execution of search warrant | None | Exonerated | 1 | 0 | 2020-01-01 | 2020-05-01 | 3 | 1 | P_79 | 12.022911 | 19.937815
28317 | Brendan | Dono | 43703 | 1 | 2020 | White | M | 39 | Hispanic | Male | 42.0 | Abuse of Authority | Vehicle search | 10.0 | Other violation of VTL | Summons | Exonerated | 1 | 0 | 2020-01-01 | 2020-06-01 | 9 | 1 | P_10 | 7.088728 | 5.732158

Duplicate rows

Most frequent

(index) | first_name | last_name | complaint_id | month_received | year_received | mos_ethnicity | mos_gender | mos_age_incident | complainant_ethnicity | complainant_gender | complainant_age_incident | fado_type | allegation | precinct | contact_reason | outcome_description | disposition_clean | ruling_conduct_occurred | ruling_conduct_violated_rules | received_datetime | closed_datetime | officer_cumcount | command_rank_num | precinct_alpha | percent_unemployed_mean | percent_nohs_mean | count
159 | John | Parente | 38526 | 12 | 2017 | White | M | 36 | Black | Male | 31.0 | Force | Nonlethal restraining device | 79.0 | PD suspected C/V of violation/crime - street | Arrest | Exonerated | 1 | 0 | 2017-12-01 | 2019-11-01 | 7 | 3 | P_79 | 12.022911 | 19.937815 | 5
267 | Miroslav | Maric | 37123 | 5 | 2017 | White | M | 35 | Hispanic | Male | 37.0 | Discourtesy | Word | 44.0 | PD suspected C/V of violation/crime - auto | Arrest | Unsubstantiated | 0 | 0 | 2017-05-01 | 2017-10-01 | 3 | 3 | P_44 | 14.321907 | 35.544366 | 5
352 | Troy | Peacock | 42099 | 4 | 2019 | Black | M | 36 | Asian | Male | 27.0 | Force | Nonlethal restraining device | 73.0 | Other | None | Exonerated | 1 | 0 | 2019-04-01 | 2020-05-01 | 6 | 3 | P_73 | 14.014504 | 25.989685 | 5
3 | Alejandro | Valderrama | 34658 | 3 | 2016 | Hispanic | M | 28 | Black | Male | 33.0 | Force | Physical force | 77.0 | PD suspected C/V of violation/crime - street | Arrest | Unsubstantiated | 0 | 0 | 2016-03-01 | 2016-09-01 | 4 | 1 | P_77 | 10.089809 | 16.844064 | 4
9 | Ananda | Mirandamessner | 39086 | 3 | 2018 | Hispanic | F | 29 | White | Female | 65.0 | Force | Physical force | 60.0 | Report-domestic dispute | Arrest | Unsubstantiated | 0 | 0 | 2018-03-01 | 2019-05-01 | 2 | 1 | P_60 | 10.037453 | 20.593446 | 3
36 | Brian | Latva | 35063 | 5 | 2016 | White | M | 32 | Black | Female | 17.0 | Force | Nonlethal restraining device | 106.0 | Other | Arrest | Substantiated | 1 | 1 | 2016-05-01 | 2016-11-01 | 0 | 3 | P_106 | 9.723378 | 21.118369 | 3
48 | Christian | Cayenne | 37644 | 7 | 2017 | Black | M | 25 | Black | Male | 23.0 | Abuse of Authority | Threat of force (verbal or physical) | 72.0 | Other | None | Unsubstantiated | 0 | 0 | 2017-07-01 | 2018-03-01 | 2 | 1 | P_72 | 8.062203 | 32.570689 | 3
53 | Christophe | Aylward | 36714 | 3 | 2017 | White | M | 36 | Other | Male | 26.0 | Force | Physical force | 105.0 | Other | None | Exonerated | 1 | 0 | 2017-03-01 | 2017-10-01 | 0 | 1 | P_105 | 8.560916 | 13.302691 | 3
64 | Christophe | Sanchez | 35299 | 6 | 2016 | White | M | 29 | Hispanic | Male | 24.0 | Force | Other | 110.0 | PD suspected C/V of violation/crime - street | Arrest | Unsubstantiated | 0 | 0 | 2016-06-01 | 2017-02-01 | 2 | 1 | P_110 | 6.096214 | 32.422545 | 3
98 | Edward | Mcclain | 40162 | 8 | 2018 | White | M | 34 | Black | Male | 44.0 | Discourtesy | Word | 44.0 | PD suspected C/V of violation/crime - street | Summons | Unsubstantiated | 0 | 0 | 2018-08-01 | 2019-06-01 | 3 | 1 | P_44 | 14.321907 | 35.544366 | 3